Overview

Welcome to the hands-on, practical learning portion of Lesson 2, where you will create a single-page R Markdown document. I made this page to help guide you through the process. The material below contains a series of assignments and challenges designed to get you comfortable using YAML metadata, Markdown syntax, and R code chunks.

The page itself is written almost entirely in R Markdown, meaning there is no extensive use of anything fancy like complicated CSS or HTML. I will introduce some simple HTML code you can use to jazz up your document a little. You should be able to solve any problem I present by digging around in the raw code on GitHub, consulting the Resources page, using one of the targeted search strategies described in the Problem Solving post, and/or posting a question to the Slack channel.

Some Keys to Success

Knit often. Whenever you make a change to the YAML header, add a new code chunk, etc., re-knit (or render) your document. This is very important. Regular knitting allows you to a) see the effects of a change and b) track down (troubleshoot) issues more easily.

Simply hit the Knit button or use the keyboard shortcut: Cmd+Shift+K on macOS or Ctrl+Shift+K on Windows. Learn to love these shortcut keys. They will save you time.
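If you prefer the Console to the Knit button, rendering can also be started with rmarkdown::render(). The following is only a minimal sketch: the file name and the document contents are placeholders made up for illustration, and the file is written to a temporary directory so nothing in your project is touched.

```r
# Minimal sketch: knit a document from the Console instead of the Knit button.
# The file name and contents below are placeholders for illustration only.
rmd_file <- file.path(tempdir(), "example.Rmd")
fence <- strrep("`", 3)  # three backticks, built here to keep this example tidy

writeLines(c(
  "---",
  "title: \"Knit test\"",
  "output: html_document",
  "---",
  "",
  "Some *Markdown* text.",
  "",
  paste0(fence, "{r cars}"),
  "summary(cars)",
  fence
), rmd_file)

# render() needs Pandoc, which ships with RStudio; skip if it is not available.
if (rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(html_file)  # path to the rendered .html file
}
```

For your own document, the call would simply point at your saved .Rmd file.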

You can choose how your document is previewed using the dropdown menu in the document settings. Preview in Window opens the document in a separate RStudio window and Preview in Viewer Pane lets you see the document in the main RStudio IDE. These are good for quick looks. You should always double-check the actual HTML file because sometimes things look different in RStudio.

Something to consider while you create your page is readability. This not only applies to the final document but also the raw R Markdown code itself. Think about someone digging through your code to figure out how you did something. Or think about yourself coming back to the code after a few years. The better your document is formatted, the easier it will be to understand. In future lessons we will return to some best practices but for now remember, part of what we are doing here is making your science more transparent and reproducible. If your document is confusing to follow then it serves neither of these purposes.

Assignment 1: The Basics

In the first assignment, you will employ some basic techniques to control the look of your document and how R code chunks are rendered. We will cover YAML metadata modifications, setting global chunk options, and writing Markdown content. You will also learn about testing R code first before rendering the final document.

1.1 Make a Document

If you have not already done so, your first task is to create an R Markdown document. Open RStudio and go to File > New File > R Markdown. A window should pop up where you can fill in the details. It is not important what you put here since you can change it at any time. What is important is that Document and HTML are both selected. We will cover other document types in the future. Click OK and follow the steps in this graphic. Remember, you do not need to add a file extension when you save the document.


Building your initial R Markdown document.



Once you have a document built and saved, there should be a .html file in your working directory. Double-click that file—it should open in your default browser. Each time you re-knit the .Rmd file, you can just refresh the browser page or double-click the file again.

1.2 Add Markdown Text

Your next task is to add some content and format the content with Markdown. This doesn’t need to be anything fancy to start. You can either add text as you go or paste a large amount of text in at once. Dealer’s choice. You can use the Markdown section of Lesson 0 or the Markdown section from the Resources page for reference.

  1. Add headers to give the document structure. Use different header levels.
  2. Add hyperlinks. We will learn about internal links later. For now, just link to outside websites.
  3. Add emphasis formatting like bold, italics, and block text.
  4. Mix and match formatting, such as making a hyperlink bold or adding italics to block text.
  5. Make a list of items.

1.3 Modify & Test R Code Chunks

Now it is time to get some practice modifying code chunk options so you can gain more control over the behavior of code and result display. If you have your own R code you are more than welcome to use it here. I will use the default code chunks that were added to the .Rmd file. Please see the section on Chunk structure & options from Lesson 2 for more details.

Here are the two default code chunks. As you can see, both have names and the second chunk has a single option.

```{r cars}
summary(cars)
```
```{r pressure, echo=FALSE}
plot(pressure)
```

There are many code chunk options you can control. Which options you use and how you set them will be determined by your needs. Test the behavior of the following options by setting each equal to either TRUE or FALSE. Render the document and see if you can figure out what changed. Each of these has a default value so you may not see a change until you set the alternative value.

  1. echo
  2. collapse
  3. eval
  4. prompt
  5. highlight
  6. include
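The six options above can also be set once for the whole document rather than chunk by chunk. The sketch below uses knitr's opts_chunk object to look up the current values and then change two of them; the values chosen are arbitrary and only meant to show the mechanics.

```r
library(knitr)

# Look up the current (default) values of the six options listed above.
opts <- c("echo", "collapse", "eval", "prompt", "highlight", "include")
str(opts_chunk$get(opts))

# Set document-wide defaults; this call normally lives in the setup chunk.
# The values chosen here are arbitrary and only for illustration.
opts_chunk$set(echo = FALSE, collapse = TRUE)

# An individual chunk can still override a global default in its header,
# e.g. {r cars, echo=TRUE}.
```

Global settings are handy once you have decided how most chunks should behave; while experimenting, setting options on single chunks makes it easier to see what each one does.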

Next, it is a really good habit to check code chunks as you add them. This will ensure that each chunk works, making it easier to track down problems. If you refer to the first image on this page, you can see there are options for Chunk Output Inline and Chunk Output in Console. These control where the output is displayed. Let’s take a quick look at a code chunk in RStudio and see how you test chunks before rendering.

Take a look at the toolbar on the far right of the code chunk. Option 1 is a dropdown menu that gives you an alternative way to set code chunk options. Option 2 will Run all Code Chunks Above, meaning that RStudio will run all code chunks above the current chunk but not the current chunk itself. And Option 3 will Run the Current Chunk. Incidentally, if you do not see these options it means something is wrong with the chunk.

Go ahead and run the chunk.

1.4 Modify YAML Metadata

Your last task is to modify the YAML header to suit your needs and tastes. I would like you to experiment with different options and settings to see what happens in the final document.

  1. Run ?rmarkdown::html_document or ?html_document in the Console to see the header options for an HTML document.
  2. Add a table of contents and include options that modify the behavior of the table of contents.
  3. Add the option to keep the Markdown document. This will save a .md copy of your file.
  4. Open the .md file in a text editor. This is the output from knitr (after all R code has been processed) and what Pandoc uses to generate the HTML file. Keep this file open as you build your document. Pay attention to how your R code is converted to Markdown syntax.
  5. Change the theme. Options are listed on the Convert to an HTML document help page you opened in RStudio. Try a few options and see what happens.
  6. Change the code highlight option. These too are listed on the help page. Try a few options and see what happens.
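The header options in the list above map directly onto arguments of rmarkdown::html_document(), so one way to see how they fit together is to build the format object in R. This is only a sketch: the particular theme, highlight style, and table of contents settings are arbitrary picks, and the file name in the last comment is hypothetical.

```r
library(rmarkdown)

# YAML equivalent (goes in the header of the .Rmd file):
#   output:
#     html_document:
#       toc: true
#       toc_depth: 3
#       toc_float: true
#       keep_md: true
#       theme: flatly
#       highlight: tango
fmt <- html_document(
  toc = TRUE,          # add a table of contents
  toc_depth = 3,       # include headers down to level 3
  toc_float = TRUE,    # float the table of contents beside the text
  keep_md = TRUE,      # keep the intermediate .md file
  theme = "flatly",    # one of the themes listed on the help page
  highlight = "tango"  # one of the highlight styles listed on the help page
)

# Pass the format to render() to apply it without editing the header, e.g.
# render("my_document.Rmd", output_format = fmt)  # hypothetical file name
```

For this assignment, editing the YAML header directly is still the intended route; the R call is just a compact way to see which options exist.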

Assignment 2: Tables

In this assignment, you will explore different methods of incorporating tables in your document. The choice of method depends on a) the type of data, b) the amount of data, and c) the desired output. I will cover a few tools for creating tables but please note there are many options out there, so look around and let us know if you find a tool you like.

For each example, I will use the mtcars dataset from the datasets package. The mtcars dataset has 32 rows and 11 columns. Feel free to load your own data table or use the mtcars dataset.

Tools

You will use four different tools in this assignment for making tables. Here is a summary table of each tool.

Table types and recommended uses.

| Table type | Table size | Formatting options | Skill level |
|---|---|---|---|
| markdown | small | minimal | beginner |
| rmarkdown::paged_table | large | minimal | beginner |
| knitr::kable + kableExtra | small | extensive | intermediate |
| DT + DataTables | large | extensive | advanced |


2.1 Markdown

The simplest method of building a table is with Markdown syntax. This is a nice option because you can hard code the table right into the document—no need to install and load libraries or write code chunks—and it is easy to implement. The downside is there is minimal functionality available in a Markdown table.

Markdown does not work well for large tables, so I will first grab a subset of mtcars, specifically the first 4 rows and 3 columns. In my code chunk I add the chunk option comment="". This prevents knitr from prepending a string (default is ##) to the start of each line of results in the final document.
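A chunk along the following lines would produce that subset. It is a sketch rather than the exact code behind this page; the name mtcars_sub is the one used in the text, and the indexing is one of several ways to take the first 4 rows and 3 columns.

```r
# Keep the first 4 rows and the first 3 columns of mtcars.
mtcars_sub <- mtcars[1:4, 1:3]

# Printing the data frame gives a plain-text table. With the chunk option
# comment="" the printed lines are not prefixed with ##.
mtcars_sub
```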

                mpg cyl disp
Mazda RX4      21.0   6  160
Mazda RX4 Wag  21.0   6  160
Datsun 710     22.8   4  108
Hornet 4 Drive 21.4   6  258


Incidentally, the results box above is technically the simplest table you can make, either by calling the data frame mtcars_sub directly or running print.data.frame(mtcars_sub).

Anyway, run this code chunk, copy the results, and make a Markdown table. You can either run the chunk in RStudio without rendering the document (described above) or render the document and copy the results from the HTML page. I added a header to the first column. And here is the Markdown table.


Demonstration of the output from pipe_tables Markdown syntax.

| model | mpg | cyl | disp |
|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 |
| Mazda RX4 Wag | 21.0 | 6 | 160 |
| Datsun 710 | 22.8 | 4 | 108 |
| Hornet 4 Drive | 21.4 | 6 | 258 |


Since this is a Markdown table, you can add additional Markdown syntax for formatting. See if you can figure out what syntax I added to my table and add some to your table. Also check out the Tables section of the Pandoc User’s Guide for other Markdown table options, including how to add a caption. In addition to pipe_tables, you can create multiline_tables, grid_tables, and simple_tables.

Something to notice is that the Markdown table spans the entire width of the page—even though it does not need all of that space. As far as I know, there is no way to control this behavior without adding additional HTML formatting.

Recommendation: Use Markdown for small, simple tables where styling is not a concern.

2.2 R Markdown Paged Tables

With larger tables, it may not be practical to display the full table inline. So we need a way to shrink a large table so it looks good while still allowing access to the full table.

The next type of table I want you to try is the paged table. R Markdown comes with its own built-in table function called paged_table. The paged_table function allows pagination of rows and columns, making it possible to render a large table in a small space.

It is easy to code paged_table, but that ease comes with a small price: limited functionality. Here are the options you do have with the paged_table function.

A Markdown table listing the options for the paged_table function.

| Option | Description |
|---|---|
| rows.print | Maximum rows to print per page. |
| max.print | Maximum rows in the table (defaults to 1000). |
| cols.print | Maximum columns in the table (defaults to 10). |
| rownames.print | Print row names as part of the table. |
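Options are passed to paged_table as a named list through its options argument. The sketch below shows the pattern with one option; the dataset (USJudgeRatings) and the value are chosen only for illustration and are not the table built on this page.

```r
library(rmarkdown)

# Show at most 5 columns per page of a wide built-in data frame.
# Dataset and option value are arbitrary choices for this sketch.
judge_tbl <- paged_table(USJudgeRatings, options = list(cols.print = 5))

# In an R Markdown chunk, put judge_tbl on its own line to display the table.
```

The other options in the table are set the same way, as additional entries in the list.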

A quick side note. You actually need to load the rmarkdown package for paged_table to work. Anyway, this is a good time to return to the first code chunk in your .Rmd file—the chunk called setup that R Markdown added by default.

I like to use this chunk to load all of the packages I need for my document. Using a single chunk for all of my packages helps me keep my document organized. Notice the setup chunk has the option include=FALSE. This prevents the content of the chunk from appearing in the final document, which for me is more stylistically appealing. I can add a sessionInfo() chunk at the end of my document to report all of the packages so this information is available to the reader. We will cover sessionInfo() when we get into more depth on the subject of reproducibility. If you do not want to load the library you can run the command like this: rmarkdown::paged_table().
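As a rough sketch of that layout, the body of a setup chunk and of a closing sessionInfo() chunk might look like the following. The package list matches the tools used in this lesson; your own document may need a different set.

```r
# Body of the setup chunk (chunk header: {r setup, include=FALSE}).
# Loading every package in one place keeps the document organized.
library(knitr)
library(rmarkdown)
library(kableExtra)
library(DT)

# Body of a final chunk: report the R version and the package versions
# that were used to build the document.
sessionInfo()
```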

Ok, back to the table. Now I can create a table of mtcars with the paged_table function and use an option to limit the number of printed rows to 5 for each page. I used echo=FALSE in my code chunk to hide the code :). By now you should know where to look for a solution.


Notice that for each column, the column class is printed below the name (text enclosed in < >). This is irritating and related to printing a table from a built-in data frame. I have no idea how to fix this (within the confines of R Markdown) but I will work on a solution.

Recommendation: Use paged_table for large tables where extensive styling is not a concern.

2.3 Kable Tables

knitr comes with its own tool for rendering simple tables called kable. The documentation for kable can be found here or by running ?knitr::kable in the Console. By itself, kable offers only a few formatting options. We can extend its functionality with the kableExtra package and piping syntax from magrittr. The features of kableExtra are extensive and I will only touch on a few here. The documentation for kableExtra can be found here or by running ?kableExtra in the Console after the package has been installed and loaded.

I highly recommend you learn how to use these packages for making tables.

Again, you will need to load kableExtra and either load the knitr package or run the command like this: knitr::kable().

First, let’s look at the default kable table output. We will use the head of the mtcars dataset.
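A call like the one below gives the default kable output for the head of mtcars. Treat it as a sketch; the caption argument is included only to show where a table caption would go.

```r
library(knitr)

# Default kable output for the first six rows of mtcars.
default_tbl <- kable(head(mtcars), caption = "Wow, this table looks terrible.")
default_tbl
```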


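Wow, this table looks terrible.

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |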


If we tried to render the entire table by omitting head, we would just get a long, crappy table in our document. Not cool. Let’s see if we can jazz this up a bit with kableExtra.

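This table looks better.

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |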

Here I used the pipe operator (%>%) to pass the results of kable to the function kable_styling—a part of the kableExtra package. By running kable_styling as is, I am using defaults for all of the options.
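In code, the piping described above looks roughly like this. It is a sketch, not necessarily the exact chunk used for the table; format = "html" is an addition on my part so the example also behaves the same when run outside of knitting.

```r
library(knitr)
library(kableExtra)  # also makes the %>% pipe available

# Pass the kable output to kable_styling() with all default options.
# format = "html" is stated explicitly as an assumption for this sketch.
styled_tbl <- kable(head(mtcars), format = "html") %>%
  kable_styling()

# In an R Markdown chunk, put styled_tbl on its own line to display it.
```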

The pipe operator is a powerful tool worth learning.

Back to the table. It certainly looks better than the default kable version but we are missing the ability to page the table. I would like you to run these two commands and look at the output.

This command will include the whole table.

kable(mtcars) %>%
  kable_styling()

And this command will transpose the table (swap rows and columns) using the transpose (t) function. Remember, mtcars has 32 rows and 11 columns but when you transpose the table, it has 11 rows and 32 columns. So it is really wide in the transposed state.

kable(head(t(mtcars))) %>%
  kable_styling()

I hope you agree that neither of these tables is acceptable, especially the second one. Unfortunately, the kableExtra package does not come with an option to add pagination. You can, however, put the table in a fixed-height or fixed-width box (or both) and make it scrollable. We can do this by using a pipe operator and the scroll_box function. While we are at it, let’s also tweak some options in kable_styling to make a more handsome table.

Scrollable kable table.

| | Mazda RX4 | Mazda RX4 Wag | Datsun 710 | Hornet 4 Drive | Hornet Sportabout | Valiant | Duster 360 | Merc 240D | Merc 230 | Merc 280 | Merc 280C | Merc 450SE | Merc 450SL | Merc 450SLC | Cadillac Fleetwood | Lincoln Continental | Chrysler Imperial | Fiat 128 | Honda Civic | Toyota Corolla | Toyota Corona | Dodge Challenger | AMC Javelin | Camaro Z28 | Pontiac Firebird | Fiat X1-9 | Porsche 914-2 | Lotus Europa | Ford Pantera L | Ferrari Dino | Maserati Bora | Volvo 142E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mpg | 21.00 | 21.000 | 22.80 | 21.400 | 18.70 | 18.10 | 14.30 | 24.40 | 22.80 | 19.20 | 17.80 | 16.40 | 17.30 | 15.20 | 10.40 | 10.400 | 14.700 | 32.40 | 30.400 | 33.900 | 21.500 | 15.50 | 15.200 | 13.30 | 19.200 | 27.300 | 26.00 | 30.400 | 15.80 | 19.70 | 15.00 | 21.40 |
| cyl | 6.00 | 6.000 | 4.00 | 6.000 | 8.00 | 6.00 | 8.00 | 4.00 | 4.00 | 6.00 | 6.00 | 8.00 | 8.00 | 8.00 | 8.00 | 8.000 | 8.000 | 4.00 | 4.000 | 4.000 | 4.000 | 8.00 | 8.000 | 8.00 | 8.000 | 4.000 | 4.00 | 4.000 | 8.00 | 6.00 | 8.00 | 4.00 |
| disp | 160.00 | 160.000 | 108.00 | 258.000 | 360.00 | 225.00 | 360.00 | 146.70 | 140.80 | 167.60 | 167.60 | 275.80 | 275.80 | 275.80 | 472.00 | 460.000 | 440.000 | 78.70 | 75.700 | 71.100 | 120.100 | 318.00 | 304.000 | 350.00 | 400.000 | 79.000 | 120.30 | 95.100 | 351.00 | 145.00 | 301.00 | 121.00 |
| hp | 110.00 | 110.000 | 93.00 | 110.000 | 175.00 | 105.00 | 245.00 | 62.00 | 95.00 | 123.00 | 123.00 | 180.00 | 180.00 | 180.00 | 205.00 | 215.000 | 230.000 | 66.00 | 52.000 | 65.000 | 97.000 | 150.00 | 150.000 | 245.00 | 175.000 | 66.000 | 91.00 | 113.000 | 264.00 | 175.00 | 335.00 | 109.00 |
| drat | 3.90 | 3.900 | 3.85 | 3.080 | 3.15 | 2.76 | 3.21 | 3.69 | 3.92 | 3.92 | 3.92 | 3.07 | 3.07 | 3.07 | 2.93 | 3.000 | 3.230 | 4.08 | 4.930 | 4.220 | 3.700 | 2.76 | 3.150 | 3.73 | 3.080 | 4.080 | 4.43 | 3.770 | 4.22 | 3.62 | 3.54 | 4.11 |
| wt | 2.62 | 2.875 | 2.32 | 3.215 | 3.44 | 3.46 | 3.57 | 3.19 | 3.15 | 3.44 | 3.44 | 4.07 | 3.73 | 3.78 | 5.25 | 5.424 | 5.345 | 2.20 | 1.615 | 1.835 | 2.465 | 3.52 | 3.435 | 3.84 | 3.845 | 1.935 | 2.14 | 1.513 | 3.17 | 2.77 | 3.57 | 2.78 |
| qsec | 16.46 | 17.020 | 18.61 | 19.440 | 17.02 | 20.22 | 15.84 | 20.00 | 22.90 | 18.30 | 18.90 | 17.40 | 17.60 | 18.00 | 17.98 | 17.820 | 17.420 | 19.47 | 18.520 | 19.900 | 20.010 | 16.87 | 17.300 | 15.41 | 17.050 | 18.900 | 16.70 | 16.900 | 14.50 | 15.50 | 14.60 | 18.60 |
| vs | 0.00 | 0.000 | 1.00 | 1.000 | 0.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.000 | 0.000 | 1.00 | 1.000 | 1.000 | 1.000 | 0.00 | 0.000 | 0.00 | 0.000 | 1.000 | 0.00 | 1.000 | 0.00 | 0.00 | 0.00 | 1.00 |
| am | 1.00 | 1.000 | 1.00 | 0.000 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.000 | 0.000 | 1.00 | 1.000 | 1.000 | 0.000 | 0.00 | 0.000 | 0.00 | 0.000 | 1.000 | 1.00 | 1.000 | 1.00 | 1.00 | 1.00 | 1.00 |
| gear | 4.00 | 4.000 | 4.00 | 3.000 | 3.00 | 3.00 | 3.00 | 4.00 | 4.00 | 4.00 | 4.00 | 3.00 | 3.00 | 3.00 | 3.00 | 3.000 | 3.000 | 4.00 | 4.000 | 4.000 | 3.000 | 3.00 | 3.000 | 3.00 | 3.000 | 4.000 | 5.00 | 5.000 | 5.00 | 5.00 | 5.00 | 4.00 |
| carb | 4.00 | 4.000 | 1.00 | 1.000 | 2.00 | 1.00 | 4.00 | 2.00 | 2.00 | 4.00 | 4.00 | 3.00 | 3.00 | 3.00 | 4.00 | 4.000 | 4.000 | 1.00 | 2.000 | 1.000 | 1.000 | 2.00 | 2.000 | 4.00 | 2.000 | 1.000 | 2.00 | 2.000 | 4.00 | 6.00 | 8.00 | 2.00 |


For the scroll_box function I set the width = "100%" rather than specifying a dimension. This ensures the box will always be the width of the page no matter how small the window is.
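A scrollable version of the transposed table could be built along these lines. Only width = "100%" comes from the text; the height, the bootstrap_options values, and the explicit format = "html" are assumptions added for this sketch.

```r
library(knitr)
library(kableExtra)

# Transposed mtcars in a scrollable box. The height and styling options
# are illustrative guesses; only width = "100%" is taken from the text.
scroll_tbl <- kable(t(mtcars), format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
  scroll_box(width = "100%", height = "300px")

# In an R Markdown chunk, put scroll_tbl on its own line to display it.
```

Leaving out height gives a box that scrolls only horizontally, which may be enough for a table with few rows.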

Even though you cannot make a paged table with kable, there are many styling options available in the kableExtra package that make this method extremely useful and worth learning. Plus, the code is relatively simple to write.

I want to show you one more feature that is not available in the other table methods we cover in this assignment: floating. Let’s first subset the mtcars dataset so we can make a small table. Next we use the full_width and position options to control the size and position of the table.

Here is our code chunk.

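| | mpg | cyl | disp | hp | drat | wt |
|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 |
| Duster 360 | 14.3 | 8 | 360 | 245 | 3.21 | 3.570 |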



Let’s say we have a bunch of text that we want to put side-by-side with this small table. Our subsetted mtcars dataset now has 7 rows and 6 columns. We can make our table smaller by setting full_width = FALSE in kable_styling and float the table by setting position = "float_right".
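Putting the two settings together, a floating table might be coded like this. The variable names are hypothetical, the subset is one way to get 7 rows and 6 columns, and format = "html" is an assumption added so the sketch also runs outside of knitting.

```r
library(knitr)
library(kableExtra)

# First 7 rows and 6 columns of mtcars (mtcars_small is a made-up name).
mtcars_small <- mtcars[1:7, 1:6]

# Shrink the table to its content and float it to the right of the text.
float_tbl <- kable(mtcars_small, format = "html") %>%
  kable_styling(full_width = FALSE, position = "float_right")

# In an R Markdown chunk, put float_tbl on its own line, followed by the
# text that should wrap around the table.
```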

Please study the extensive options available in kable and kableExtra and create tables that implement some of the options.



Recommendation: Use kable + kableExtra for small tables where extensive styling is desired.

2.4 DT Tables

The last option I want to cover for building tables is implemented using the datatable function from the DT package, an interface to the JavaScript library DataTables. To demonstrate the functionality, I will use a larger dataset called USJudgeRatings from the datasets package. USJudgeRatings has 43 rows and 12 columns. This table is too big—horizontally and vertically—to fit on a standard page.

The syntax for DT::datatable is more complicated than the other methods but that comes with more extensive functionality.

Working with DT::datatable is an advanced level skill. I highly recommend you learn how to use the package, but it will take practice.

Please make sure you are comfortable with the other methods first before trying to use DT::datatable. I promise, if you do not know what you are doing, this package will cause a lot of frustration. That said, I use it all the time because it is awesome.

Moving on. If we run DT on the USJudgeRatings dataset without any options the table will spill off the side of the page. Again, not cool. Try to run this command and see what happens.

datatable(USJudgeRatings)

DT::datatable does not page tables horizontally like the paged_table command does (described above). Instead, we can set the width of the table and add an option that allows horizontal scrolling. For this we use the options argument. The syntax is to add width = "100%" followed by options = list(), where we put a comma-separated list of options. For now, we just include scrollX in our list of options.
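Following that description, the call would look something like the sketch below. The variable name judge_dt is made up for illustration.

```r
library(DT)

# Full-width table with horizontal scrolling enabled.
judge_dt <- datatable(
  USJudgeRatings,
  width = "100%",
  options = list(scrollX = TRUE)
)

# In an R Markdown chunk, put judge_dt on its own line to display the widget.
```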


Play around with the table a little. As you can see:

  • the table now fits in the window,
  • horizontal scrolling is enabled,
  • the table is vertically paged,
  • there is a Show entries dropdown, and
  • there is a Search box.

The Show entries and Search box are added by default. We can decide whether to show these options or not. I will save that for later. For now, I want to leave you with a more stylized DT datatable to give you a sense of the possibilities. Don’t worry so much about the code—pay attention to the functionality.



I added buttons to download the table to different formats, changed the page length to 5, and changed the values in the Show entries dropdown. Play around with the table. There is a lot more to do with this package and we will come back to it often.
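For the curious, features like these can be switched on with the Buttons extension and a few more entries in the options list. The sketch below is one possible configuration, not the exact code behind the table on this page: the button set, the dom string, and the dropdown values are all assumptions.

```r
library(DT)

# A more stylized table. Button types, dom layout, and menu values are
# illustrative assumptions, not the settings used on this page.
fancy_dt <- datatable(
  USJudgeRatings,
  width = "100%",
  extensions = "Buttons",                 # enables the download buttons
  options = list(
    dom = "Blfrtip",                      # B = buttons, l = Show entries, f = Search
    buttons = c("copy", "csv", "excel"),  # download/export formats
    pageLength = 5,                       # rows shown per page
    lengthMenu = c(5, 10, 25, 50),        # values in the Show entries dropdown
    scrollX = TRUE                        # horizontal scrolling
  )
)

# In an R Markdown chunk, put fancy_dt on its own line to display the widget.
```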

Recommendation: Use DT::datatable for large tables where extensive styling is desired.

That’s all for this assignment. Next time we will discuss figures and images.

Session Info

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] DT_0.12          rmarkdown_2.1    kableExtra_1.1.0 epuRate_0.1     
## [5] knitr_1.28      
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3        later_1.0.0       pillar_1.4.3      compiler_3.6.1   
##  [5] highr_0.8         tools_3.6.1       digest_0.6.24     jsonlite_1.6.1   
##  [9] evaluate_0.14     tibble_2.1.3      lifecycle_0.1.0   viridisLite_0.3.0
## [13] pkgconfig_2.0.3   rlang_0.4.4       shiny_1.4.0       rstudioapi_0.11  
## [17] crosstalk_1.0.0   yaml_2.2.1        xfun_0.12         fastmap_1.0.1    
## [21] httr_1.4.1        stringr_1.4.0     xml2_1.2.2        vctrs_0.2.2      
## [25] htmlwidgets_1.5.1 hms_0.5.3         webshot_0.5.2     glue_1.3.1       
## [29] R6_2.4.1          readr_1.3.1       magrittr_1.5      promises_1.1.0   
## [33] scales_1.1.0      htmltools_0.4.0   rvest_0.3.5       xtable_1.8-4     
## [37] mime_0.9          colorspace_1.4-1  httpuv_1.5.2      stringi_1.4.6    
## [41] munsell_0.5.0     crayon_1.3.4